The Role of Prosody in the Marking of Information Structure in Korean Speakers of L2 English
Author: Suzy Park, Yonsei University
This article reviews the processing of prosodic features in L2, with a special focus on Korean learners of English. After a brief discussion of the role of prosody in expressing information structure and of L2 learners’ processing of prosodic cues, it examines the effects of L1 interference for Korean learners, which stem from differences in prosodic realization between Korean and English.
1. Prosody and Information Structure
Prosody is a broad concept that encompasses various properties of speech that contribute to the proper understanding of a speaker’s utterance and intentions. In particular, intrinsic prosody, which includes suprasegmental features such as pitch, stress, rhythm, duration, and amplitude, is known to play a crucial role at several stages of language processing, such as resolving syntactic ambiguity (e.g., Snedeker & Trueswell, 2003), identifying lexical boundaries (Brown et al., 2011), interpreting a speaker’s intentions (Hellbernd & Sammler, 2016), and even enhancing memory recall (Schmidt et al., 2020). The current article concentrates on studies of intrinsic prosody because of its prominence in the literature, while also noting a growing number of studies on emotional or attitudinal prosody (e.g., Mitchell & Ross, 2013; Pell et al., 2009).
One interesting area that deals with how prosody influences language processing is the relationship between pitch accents and information structure. Information structure concerns how concepts and referents are organized within a discourse context, along dimensions such as givenness (new vs. given information), topic, and focus (Dimroth & Narasimhan, 2012). As prosodic structure is generally considered to be formed independently of lexical content (Ferreira, 1993), information structure is thought to guide how prosodic structure is built according to the relative importance of the information being delivered (Cho, 2022). For example, the placement of different pitch accents (or lack thereof) might indicate whether a piece of information is given, new, or contrasted with other items in an alternative set.
Different pitch accents are associated with distinct meanings and functions (Pierrehumbert & Hirschberg, 1990), and various experimental paradigms have subsequently been used to test whether people are actually sensitive to prosodic cues when discerning the discourse status of the elements of an utterance. A now-famous study conducted by Ito and Speer (2008) asked participants to follow successive instructions in English on decorating a Christmas tree. They had to pick up an item that belonged to a set of identical items in different colors, and the next instruction carried either an L+H* or an H* accent on the color adjective. Using eye-tracking, the authors found that when the adjective received the L+H* accent, people expected the upcoming referent to be of the same type in a different color, rather than a completely different item. The L+H* accent was concluded to carry a contrastive meaning that evokes a contrastive set of possible items.
Contrastive pitch accents are also useful in eliciting pragmatic inferences, as shown in Kurumada et al. (2014). In a visual world paradigm, sentences such as “It looks/LOOKS like a zebra…” were provided in two conditions: either with an H* accent and a low boundary tone (not contrastive) or with an L+H* accent and a high boundary tone (contrastive). The contrastive condition would lead participants to infer that the referent is not actually a zebra and to predict the competitor that appears similar. The presence of contrastive pitch accents facilitated online, incremental pragmatic processing that gives rise to such inferences.
On a similar note, Arnold (2008) also used the visual world paradigm to see whether participants were sensitive to the distinction between given and new information based on accenting patterns. Indeed, when the target word was unaccented, there was a strong bias towards given information, with eye movements predicting that the upcoming item would be the one that had been previously mentioned. On the other hand, the bias towards new information was not as strong in accented conditions. This was attributed to the fact that accented information is not necessarily new every time: given information that has not been recalled for a long time, for example, could also receive a prominent pitch accent. These results confirm that although prosodic cues help distinguish between different discourse statuses, the mapping is not always one-to-one and can be largely probabilistic.
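The probabilistic view described above can be made concrete with a simple Bayesian formulation. This is only an illustrative sketch and not a model proposed in the studies cited: the symbols $d$ (discourse status), $a$ (observed accent pattern), and $\mathcal{D}$ (the set of candidate statuses) are notation assumed here.

```latex
% Illustrative sketch only; d, a, and \mathcal{D} are notation assumed here,
% not taken from Arnold (2008).
P(d \mid a) = \frac{P(a \mid d)\, P(d)}{\sum_{d' \in \mathcal{D}} P(a \mid d')\, P(d')},
\qquad d \in \mathcal{D} = \{\text{given}, \text{new}\}
```

Under this reading, and assuming comparable priors, an unaccented word strongly favors the given interpretation when $P(\text{unaccented} \mid \text{new})$ is small, whereas an accented word yields a weaker bias towards new information because $P(\text{accented} \mid \text{given})$ is not negligible.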
A recent study by Roettger et al. (2019) analyzed the mapping between prosodic forms and discourse structures in American English using three perception experiments. Sentences were produced with focus in one of four conditions (Broad, Narrow, Contrastive, or Given), and participants were asked to match the question to the correct response (Experiment 1) or the other way around (Experiment 2), with Experiment 3 testing both using a Likert scale instead of forced-choice tasks. The results revealed that performance varied greatly depending on the focus condition. An interesting bias pattern was also found: there was a bias against the broad focus context for any prosodic pattern, and a preference for the narrow focus context no matter what prosodic form was present. Given and contrastive focus conditions were matched correctly to the congruent contexts. Such discrepancies between conditions support the authors’ idea that prosody-to-function mapping is inherently probabilistic, and that listeners can predict or process intentions despite large variability between speakers because they make use of top-down probabilistic knowledge about whether the speaker will produce certain prosodic forms. As the results show, much ambiguity remains regarding the mapping between prosodic form and speaker intentions. Such vagueness, coupled with the language-specific nature of prosodic features, emphasizes the need to further clarify the varying functions of prosodic cues, especially in understudied minority languages. In addition, the universality of information structure should be firmly established to allow better comparison of L2 speakers’ prosodic processing in terms of their understanding of discourse context.
2. Linguistic and Prosodic Processing of L2 Speakers
The question of whether L2 processing at various levels can converge on the mechanisms of native speakers’ processing has long been debated in the bilingualism literature. Various models have thus been proposed to account for differences in L1 and L2 processing, with conflicting predictions on whether L2 speakers can come to process their second language similarly to native speakers as their proficiency increases. Some models do not rule out such a possibility. Examples include the RAGE Hypothesis (Grüter & Rohde, 2013), which states that L2 speakers’ reduced ability to generate expectations about upcoming referents may become more native-like as proficiency increases, and the Convergence Hypothesis (Green et al., 2006), which predicts more explicitly that neural representations of L2 can become highly similar to those of L1 as speakers become more proficient.
There are theories, on the other hand, that posit fundamental differences between L1 and L2 processing that cannot be overcome, even with increasing proficiency. The Shallow Structure Hypothesis (Clahsen & Felser, 2006) argues that, compared to L1 speakers, L2 speakers tend to rely more on lexical or semantic information than on syntactic information. Another model that assumes that processing difficulties cannot be overcome is the Interface Hypothesis (Sorace, 2011), which holds that L2 speakers generally have greater difficulty when the interface between syntax and external domains such as pragmatics or discourse is involved. Although various tasks and paradigms have been used to test the question at the phonetic, lexical, semantic, and syntactic levels, the debate persists.
However, relatively few studies have investigated how L2 speakers understand and use prosodic cues. Prosodic elements are known to be difficult for L2 speakers to learn, occasionally even being referred to as the “final hurdle” for advanced learners (Banjo, 1979). In order to master the production of prosodic features, learners must be able to integrate knowledge of prosodic elements with knowledge of how they interact with phonetic realizations or syntactic structures. In addition, they must also be aware of what kinds of paralinguistic structures are employed to highlight certain parts of an utterance or to deaccent following information (Mennen & de Leeuw, 2014).
Studies that employ eye-tracking and visual world paradigms with various L2 populations have looked at whether L2 speakers are able to predict upcoming items in online language processing. Foltz (2021) found a processing delay in L1-German L2-English speakers when participants had to rely on the contrastive L+H* accent to restrict the set of upcoming referents while following instructions to click on successive objects, such as “Click on the red carrot. Click on the GREEN/green carrot.” Nakamura et al. (2020) tested the resolution of syntactic ambiguity in temporarily ambiguous sentences with prepositional phrases and concluded that L2 speakers of English were unable to make use of contrastive accents in visual scenes with two possible referents. This contrasts with native speakers, for whom contrastive accents played a facilitatory role in processing, and suggests that integrating various sources of information is difficult for L2 learners.
Similarly, Liu and Reed (2021) discovered that although there were no notable differences in processing information structure when syntactic structures were predictable and stable, L2 speakers did not process contrastive visual scenes as holistically as L1 speakers when the sentences were more structurally complex and less predictable. The authors attributed these differences to processing limitations of L2 speakers, which had previously been suggested by O’Brien and Féry (2015) as the possible cause behind L2 speakers’ failure to process and produce L2 intonation similarly to L1 speakers. Based on these results, the dominant view is that L2 speakers can employ prosodic cues in language comprehension, but they may not do so in a predictive manner.
Other studies have looked at the varying methods by which focus is projected and the relative weights of focus cues in each language. For example, Yan and Calhoun (2020) compared how prosodic prominence and cleft structures affect referent encoding in Mandarin Chinese and English using false alternative rejection tasks. Prosodic prominence was found to be a stronger cue in Mandarin Chinese than in English, and the inhibitory effect of clefting was stronger in English, which suggests that the relative contribution of weighted cues for realizing focus is language-specific. In a subsequent study (Yan et al., 2022), the authors considered Mandarin Chinese learners’ use of prosodic and clefting cues in interpreting focus in English, their L2. Unlike L1 English speakers, who favor clefting cues when prosodic and syntactic cues do not match, L2 learners placed equal weight on the two cue types, likely due to transfer effects from their L1. These results emphasize the influence of language-specific features on L2 speakers’ processing of focus. Some view the L1 as the filter through which perception of the L2 is actualized, which underscores the necessity of understanding the particular means by which prosodic prominence is achieved in the L1. The case of Korean will be explored in the next section.
3. Differences in Focus Realization Between Korean and English
As mentioned above, another important aspect to consider in L2 processing is the degree of similarity between L1 and L2. It has been well established in previous studies that L2 processing at many levels can be subject to L1 transfer due to inherent differences between languages. English is a language in which pitch accents on lexical items can signal focus, which is often new or important information in the utterance. Words that receive focus are often longer in duration, greater in amplitude, followed by a longer pause, and marked by the contrastive pitch accent (L+H*) (e.g., Pierrehumbert & Hirschberg, 1990; Ladd, 2008). Korean, on the other hand, does not have lexical pitch accents or stress (e.g., Lee, 2015). Consequently, focus is usually achieved through syntactic means such as clefting or word position (e.g., Kember et al., 2021), or through Accentual Phrases (AP) (e.g., Jun, 1998), which are marked by tonal patterns (e.g., LHLH) over syllables and a phrase-final rising contour. In terms of prosodic boundaries, Korean is known to use dephrasing, a phenomenon in which AP boundaries are deleted when they are preceded by prominent elements (Cho, 2022). Certain postpositional markers such as -i/ka (nominative), -ul/lul (accusative), and -nun (focus/topic) are representative examples of how information structure can be expressed through morphosyntactic features (Park & Yeon, 2023). Despite the attention given to morphosyntax in signaling differences in information structure, there is a growing body of research that aims to uncover Korean speakers’ sensitivity to prosodic cues for focus, with evidence for the existence of prominence-induced prosodic strengthening (Kim et al., 2023).
4. Processing of English Focus Cues of Korean Learners of English
With regard to production, it has been found that Korean speakers of English often experience difficulties in producing appropriate pitch accents to project focus, presumably due to negative L1 transfer. Um et al. (2001) tested Korean speakers’ ability to produce sentences containing broad or narrow focus in English, replicating the experimental design of Birch and Clifton (1995). In Experiment 1, Korean speakers were asked to produce sentences with varying pitch accent patterns, with narrow focus on the subject, verb, or direct object, or with broad focus achieved by focus projection principles (Selkirk, 1995). Korean speakers were unable to differentiate between the conditions and added an H* or L+H* accent to every content word in all conditions. A replication with high-proficiency speakers who were aware of the different meanings yielded identical results, indicating that advanced Korean learners of English who are knowledgeable about focus projection still find it difficult to realize it in speech.
A similar study conducted by Liu and Lee (2022) asked Korean learners of English of varying proficiency levels to produce random strings of digits in broad focus or contrastive focus conditions. Advanced speakers produced prosodic cues similar to those of native speakers, but there were still considerable differences in post-focus compression, seemingly due to a lower pitch peak in the pitch accent position. Intermediate- and low-level speakers did not distinguish between the conditions in their production at all. Based on the results discussed above, it can be assumed that L1 transfer effects, and the resulting difficulties in processing focus and identifying information structure in L2, are greater when the focus projection methods of L1 and L2 differ greatly.
Regarding perception, few studies have dealt with how Korean learners of English perceive lexical stress or accenting in English. Kim and Tremblay (2022) discovered that Korean speakers performed better than French speakers at processing lexical stress in English, possibly because the Korean Accentual Phrase (AP) has two tonal patterns (i.e., LHLH, HHLH), whereas the French AP has one (i.e., LHiLH*). As intonation plays a role in segmentation in Korean, it is likely that positive L1 transfer played a facilitatory role in discerning lexical contrasts in English. Another study related to L2 speakers’ processing of focus structures looked at how Korean and Spanish learners of English discriminate English words and sentences that differ in lexical stress or sentential focus (Lee et al., 2019). Stress oddity tasks, in which three words were presented (e.g., COMpact-inCITE-TRUsty) and participants had to choose the word with a different lexical stress pattern (e.g., inCITE), were followed by sentential focus oddity tests, in which three sentences were provided and participants had to select the sentence with a different intonation pattern. Results revealed that although Korean learners of English performed worse at stress oddity tasks, they outperformed Spanish learners at detecting differences in sentential focus, suggesting that the use of F0 cues in Korean Accentual Phrases allowed for better perception of sentence-level prominence cues. In addition, Korean learners were nevertheless still sensitive to lexical stress patterns in English, even though lexical stress is not a feature of their L1. These results reveal that although there are definite effects of L1 transfer, L2 speakers are capable of perceiving cues and features that are not present in their L1.
5. Conclusion and Further Directions
The varieties spoken by L2 learners could be understood as complete systems whose underlying mechanisms can be dissected, rather than as imperfect varieties of the target language. Keeping that in mind, further research in L2 prosody should investigate cross-linguistic differences in prosodic systems and how they may translate into varying magnitudes of L1 interference effects. At the same time, another remaining task is to discover the fundamental causes of the difficulties that L2 learners face, which may result from motor constraints or from influences of cognitive functions such as memory and inhibitory skills.
References:
Arnold, J. E. (2008). THE BACON not the bacon: How children and adults understand accented and unaccented noun phrases. Cognition, 108(1), 69-99.
Banjo, A. (1979). Beyond intelligibility in Nigerian English. Varieties and functions of English in Nigeria, 7-13.
Birch, S., & Clifton Jr, C. (1995). Focus, accent, and argument structure: Effects on language comprehension. Language and Speech, 38(4), 365-391.
Brown, M., Salverda, A. P., Dilley, L. C., & Tanenhaus, M. K. (2011). Distal prosody influences lexical interpretation in online sentence processing. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 33, No. 33).
Cho, T. (2022). Linguistic functions of prosody and its phonetic encoding with special reference to Korean. Japanese/Korean Linguistics, 29.
Clahsen, H., & Felser, C. (2006). Continuity and shallow structures in language processing. Applied Psycholinguistics, 27(1), 107-126.
Dimroth, C., & Narasimhan, B. (2012). The acquisition of information structure. The expression of information structure, 319-362.
Ferreira, F. (1993). Creation of prosody during sentence production. Psychological Review, 100(2), 233.
Foltz, A. (2021). Using prosody to predict upcoming referents in the L1 and the L2: The role of recent exposure. Studies in Second Language Acquisition, 43(4), 753-780.
Green, D. W., Crinion, J., & Price, C. J. (2006). Convergence, degeneracy, and control. Language Learning, 56, 99-125.
Grüter, T., & Rohde, H. (2013). L2 processing is affected by RAGE: Evidence from reference resolution. In the 12th conference on Generative Approaches to Second Language Acquisition (GASLA).
Hellbernd, N., & Sammler, D. (2016). Prosody conveys speaker’s intentions: Acoustic cues for speech act perception. Journal of Memory and Language, 88, 70-86.
Ito, K., & Speer, S. R. (2008). Anticipatory effects of intonation: Eye movements during instructed visual search. Journal of Memory and Language, 58(2), 541-573.
Jun, S. A. (1998). The accentual phrase in the Korean prosodic hierarchy. Phonology, 15(2), 189-226.
Kember, H., Choi, J., Yu, J., & Cutler, A. (2021). The processing of linguistic prominence. Language and Speech, 64(2), 413-436.
Kim, D. J., Kim, O., & Park, H. (2023). Prosodic realization of identificational and contrastive focus in Korean multiple accusative constructions. Glossa: a journal of general linguistics, 8(1).
Kim, H., & Tremblay, A. (2022). Intonational Cues to Segmental Contrasts in the Native Language Facilitate the Processing of Intonational Cues to Lexical Stress in the Second Language. Frontiers in Communication, 7, 845430.
Kurumada, C., Brown, M., Bibyk, S., Pontillo, D. F., & Tanenhaus, M. K. (2014). Is it or isn’t it: Listeners make rapid use of prosody to infer speaker meanings. Cognition, 133(2), 335-342.
Ladd, D. R. (2008). Intonational phonology. Cambridge University Press.
Lee, G. (2015). Production and perception of Korean and English word-level prominence by Korean speakers (Doctoral dissertation, University of Kansas).
Lee, G., Shin, D. J., & Garcia, M. T. M. (2019). Perception of lexical stress and sentence focus by Korean-speaking and Spanish-speaking L2 learners of English. Language Sciences, 72, 36-49.
Liu, D., & Reed, M. (2021). Exploring the complexity of the L2 intonation system: An acoustic and eye-tracking study. Frontiers in Communication, 6, 627316.
Liu, J., & Lee, Y. C. (2022). Focus prosody by Korean learners of English. Linguistic Approaches to Bilingualism, 12(6), 748-777.
Mennen, I., & de Leeuw, E. (2014). Beyond segments: Prosody in SLA. Studies in Second Language Acquisition, 36(2), 183-194.
Mitchell, R. L., & Ross, E. D. (2013). Attitudinal prosody: What we know and directions for future study. Neuroscience & Biobehavioral Reviews, 37(3), 471-479.
Nakamura, C., Arai, M., Hirose, Y., & Flynn, S. (2020). An extra cue is beneficial for native speakers but can be disruptive for second language learners: Integration of prosody and visual context in syntactic ambiguity resolution. Frontiers in Psychology, 10, 2835.
O’Brien, M. G., & Féry, C. (2015). Dynamic localization in second language English and German. Bilingualism: Language and Cognition, 18(3), 400-418.
Park, C., & Yeon, J. (2023). Information structure in Korean: What's new and what's old? Journal of Pragmatics, 205, 16-32.
Pell, M. D., Monetta, L., Paulmann, S., & Kotz, S. A. (2009). Recognizing emotions in a foreign language. Journal of Nonverbal Behavior, 33, 107-120.
Pierrehumbert, J., & Hirschberg, J. B. (1990). The meaning of intonational contours in the interpretation of discourse. Intentions in Communication, MIT Press.
Roettger, T. B., Mahrt, T., & Cole, J. (2019). Mapping prosody onto meaning–the case of information structure in American English. Language, Cognition and Neuroscience, 34(7), 841-860.
Schmidt, E., Pérez, A., Cilibrasi, L., & Tsimpli, I. (2020). Prosody facilitates memory recall in L1 but not in L2 in highly proficient listeners. Studies in Second Language Acquisition, 42(1), 223-238.
Selkirk, E. (1995). Sentence prosody: Intonation, stress, and phrasing. The Handbook of Phonological Theory, 1, 550-569.
Snedeker, J., & Trueswell, J. (2003). Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language, 48(1), 103-130.
Sorace, A. (2011). Pinning down the concept of “interface” in bilingualism. Linguistic Approaches to Bilingualism, 1(1), 1-33.
Um, H. Y., Lee, H. S., & Kim, K. H. (2001). Korean speakers' realization of focus and information structure on English intonation in comparison with English native speakers. Speech Sciences, 8(2), 133-148.
Yan, M., & Calhoun, S. (2020). Rejecting false alternatives in Chinese and English: The interaction of prosody, clefting, and default focus position. Laboratory Phonology, 11(1).
Yan, M., Warren, P., & Calhoun, S. (2022). Focus interpretation in L1 and L2: The role of prosodic prominence and clefting. Applied Psycholinguistics, 43(6), 1275-1303.